Overview of the TREC 2010 Web Track

Authors

Charles L. A. Clarke

Nick Craswell

Ian Soboroff

Gordon V. Cormack

Abstract

The TREC Web Track explores and evaluates Web retrieval technology over large collections of Web data. In its current incarnation, the Web Track has been active for two years. For TREC 2010, the track includes three tasks: 1) an adhoc retrieval task, 2) a diversity task, and 3) a spam task. As we did for TREC 2009, we based our experiments on the billion-page ClueWeb09 data set created by the Language Technologies Institute at Carnegie Mellon University.

The TREC 2009 Web Track included a traditional adhoc retrieval task, employing topical binary relevance assessments and reporting estimated MAP as its primary effectiveness measure [4]. For TREC 2010, we modified this traditional assessment process to incorporate multiple relevance levels, which are similar in structure to the levels used in commercial Web search. This new assessment structure includes a spam/junk level, which also assisted in the evaluation of the spam task. The top two levels of the assessment structure are closely related to the homepage finding and topic distillation tasks appearing in older Web Tracks.

The diversity task was introduced for TREC 2009 and continues in TREC 2010, essentially unchanged [4]. The goal of this diversity task is to return a ranked list of pages that together provide complete coverage for a query, while avoiding excessive redundancy in the result list. The adhoc and diversity tasks share topics, which were developed by NIST with the assistance of information extracted from the logs of a commercial Web search engine [9]. Topic creation and judging attempt to reflect a mix of genuine user requirements for the topic.

An analysis of last year’s results indicates that the presence of spam and other low-quality pages substantially influenced the overall results [7]. This year we provided a preliminary spam ranking of the pages in the corpus, as an aid to groups who wish to reduce the number of low-quality pages in their results. The associated spam task required groups to provide their own ranking of the corpus according to “spamminess”.

Table 1 summarizes participation in the TREC 2010 Web Track. A total of 23 groups participated in the track, a slight decrease from last year, when 26 groups participated. Many of the groups participating in the diversity task also participated in the adhoc task, but not vice versa. The spam task attracted only 3 participants, including one group that participated only in this task. Only one group, ICTNET, participated in all three tasks.

Similar resources

Overview of the TREC 2010 Legal Track (Notebook Draft, 2010.10.25)

The TREC 2010 Legal Track consisted of two distinct tasks: the learning task, in which participants were required to estimate the probability of relevance for each document, and the interactive task, in which participants were required to identify all relevant documents using a human-in-the-loop process. 2010 is the fifth year of the legal track, the third year of the interactive task within the ...

Overview of the TREC 2013 Crowdsourcing Track

In 2013, the Crowdsourcing track partnered with the TREC Web Track and had a single task to crowdsource relevance judgments for a set of Web pages and search topics shared by the Web Track. This track overview describes the track and provides analysis of the track’s results.

University of Essex at the TREC 2010 Session Track

This paper provides an overview of the experiments we carried out at the TREC 2010 Session Track. We propose an approach for interpreting reformulated queries by using query expansions derived from anchor logs which we envisage to be a potential alternative to query logs. We show that expansion with terms or phrases extracted from anchor logs improves the retrieval performance over a search ses...

Webis at the TREC 2010 Sessions Track

In this paper we provide an overview of the Webis group’s two-phase approach to the TREC 2010 Sessions track. In a preprocessing phase the queries are segmented to highlight contained concepts. In the final retrieval phase we treat Carnegie Mellon’s ClueWeb search engine as a black box and apply the MAXIMUM QUERY framework.

Overview of the TREC 2013 Federated Web Search Track (draft)

The goal of the TREC Federated Web Search track is to promote research related to federated search, in a realistic web setting. This overview paper discusses the main results of the FedWeb 2013 track. In this first year of the track, we focused on basic challenges in federated search: (1) resource selection, and (2) results merging. After an overview of the provided data collection and the rele...
